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Abstract 


Several  algorithms  are  presented  for  solving  linear  least  squares 
problems;  the  basic  tool  is  orthogonalization  techniques.  A  highly 
accurate  algorithm  is  presented  for  solving  least  squares  problems  with 
linear  inequality  constraints.  A  method  is  also  given  for  finding  the 
least  squares  solution  when  there  is  a  quadratic  constraint  on  the 
solution. 


0.  Introduction 


One  of  the  most  common  problems  in  any  computation  center  is  that  of 
finding  linear  least  squares  solutions.  These  problems  arise  in  a  variety 
of  areas  and  in  a  variety  of  contexts.  For  instance,  the  data  may  be 
arriving  sequentially  from  a  source  and  there  may  be  some  constraint  on 
the  solution.  linear  least  squares  problems  are  particularly  difficult 
to  solve  because  they  frequently  involve  large  quantities  of  data,  and 
they  are  ill-conditioned  by  their  very  nature. 

In  this  paper,  we  shall  present  several  numerical  algorithms  for 
solving  linear  least  squares  problems  in  a  highly  accurate  manner.  In 
addition,  we  shall  give  an  algorithm  for  solving  linear  least  squares 
problem  with  linear  inequality  constraints. 


1.  Linear  least  squares 


Let 

vector. 


A  be  a  given  mxn 
We  wish  to  determine 
m  n 
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real  matrix  of  rank 
x  such  that 

=  min. 
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and  b  a  given 


or  using  matrix  notation 


Ilfe-Alfllg  =  min.  (l.l) 

If  m  >  n  and  r  <  n  ,  then  there  is  no  unique  solution.  Under  these 
conditions,  we  require  amongst  those  vectors  x  which  satisfy  (l.l)  t  lat 


min. 


For  r  =  n  ,  x  satisfies  the  normal  equations 
T  T 

A  Ax  =  A  b  .  (1.2) 

T 

Unfortunately,  the  matrix  A  A  is  frequently  ill-conditioned  and 
influenced  greatly  by  roundoff  errors.  The  following  example  illustrates 
this  well.  Suppose 
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which  is  clearly  of  rank  1  .  Then 
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and  the  eigenvalues  of  A  A  are  A+e  ,  £  ,  e  ,  e  .  Assume  that  the 

T 

elements  of  A  A  are  computed  using  double  precision  arithmetic,  and  then 
rounded  to  single  precision  accuracy.  Now  let  t)  be  the  largest  number 
on  the  computer  such  that  fi(l+q)  =  1  where  indicates  the 

floating  point  computation.  Then  if  t  <  /i\  , 


f  l  (ATA) 
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a  matrix  of  rank  one,  and  consequently,  no  matter  how  accurate  the  linear 
equation  sol-er  it  will  be  impossible  to  solve  the  normal  equations  (1,2). 

IjONGIHY  [  1 9* I? ]  has  given  examples  in  which  the  solurion  of  the  normal 
equations  leads  to  almost  no  digits  of  accuracy  of  the  least  squares  problem. 


A  matrix  decomposition 


Mow  i|yj|0  =  (yTy)1//‘“  so  that  |!Qyi|0  =  lly!l0  when  Q  is  an  orthogonal 
matrix. 


T 

. ,  A  A  =  I  .  Thus 


w  i .  f ;  ro 


^-Axil2  =  il<:-QAx||0 

A-  and  0  is  an  orthogonal  matrix.  We  choose  Q  so  that 

~  1 
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) (m-n) 


xn 


QA  -  R  -= 


0 


(2-1) 


where  R  is  an  upper  triangular  mrtrix  (^J).  Let, 


11  12  '  * 


lb-Axjg  =  (c  -r  x  -r  x  - 


^1  111  12  2 

+  (c2-r22X2  ‘  "  r2nXr/ 

/  \2 

+  . . .  +  (c  -r  x 

v  n  nn  n' 

2  2  2 

n+1  n+2  m 


r  x  )‘ 
In  n 


Thus  |(b-Ax|[p  is  minimized  when 


rllXl  +  ri2X2  +  *  * '  +  rinXn  “  C1 
r22X2  +  **•  +  r2nxn  =  °2 


r  x  =  c 
nn  n  n 


i.e.,  Rx  =  c,  where 


T 

f  =  (cl,c2’ * ’ *,Cn^  ’ 


'MXI12  =  Cn+1  +  Cn+2  +  *  *  *  +  C; 


rtr  =  [r:o]t[r:o]  -  rtr 

•  • 

=  [QAjT[QA]  =  ATA  , 

and  thus  RXR  is  simply  the  Cholesky  decomposition  of  A^A  . 

There  are  a  number  of  ways  to  achieve  the  decomposition  of  (2.l); 
e.g.  on"  could  apply  a  sequence  of  plane  rotations  to  annihilate  the 
elements  below  the  diagonal  of  A  .  A  very  effective  method  to  realize 


5 


the  decomposition  (2.l)  is  via  Householder  transformations.  A  matrix  P 
is  said  to  be  a  Householder  transformation  if 


T  T 

P  =  I  -  2uu  ,  u  u  =  1 


Note  that  l)  P  *  PT  and  2)  PPT  =  I  -  2uuA  -  2uuT  +  UuuTuu*  =  I  so 
that,  P  is  a  symmetric,  orthogonal  transformation. 

Lot  A^  =  A  and  let  A^  ,  A^  ,...,  A^n+"^  be  defined  as  tollows: 


A(k+I)  =  p(k)A(k) 


(k  "  1,2, ... ,n) 


where  P 


„  I  -  ,  w<*»>  =  1  . 


The  matrix  is 


.  ,,  .  (k+l)  (k+l)  (k+l)  - 

chosen  so  that  a^R  -  aJ+2>'  =  •••  =  =  0 

transforms t ions 

a(2)  a(2) 
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Thus  after  k 
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In 
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mn 


L'ote  that  |a^'+i')|  =  (£  (a)^)2)1^ 

1  kk  V4-i=k  '  ik 


since  P 


(k) 


is  an  orthogonal 


' i=k  ik 

transformation.  The  details  of  the  computation  are  given  in  BUSINGER  and 
GOLUP  [19^51  and  GOLUB  [1965].  The  Householder  transformations  have  been  used 
in  a  hiehly  effective  manner  by  KALFON  et  al.  [1968]  in  the  implementation 
of  the  pro,]< -at ion  gradient  method. 

Clearly 
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and 

Q  =  P(n)P(>l)  ...  P(l) 

although  one  need  not  compute  Q  explicitly.  The  number  o:'  mult. ipliont  1  ons 
required  to  produce  R  is  roughly  ian~-(iP/3)  whereas  approximately 

O 

multiplications  are  required  to  form  the  normal  equations  (l..:) . 


The  practical  procedure 


It  is  Known  that  the  Cholesky  method  for  solving  systems  of  ••quntiona 

is  numerically  stable  even  if  no  interchanges  of  rows  and  ;olumns  arc 

performed.  Since  we  are  in  effect  performing  a  Cholesky  decomposition 
T 

of  A  A  no  interchanges  of  the  columns  of  A  are  needed  in  most 

situations.  However,  numerical  experiments  have  indicated  that  the 

accuracy  is  slightly  improved  by  the  lnt.erch.ange  strategies  outlined 

below,  and  consequently,  in  order  to  ensure  the  utmost  accuracy  one 

should  choose  the  columns  of  A  by  some  strategy.  In  what  follows, 

(k'i 

we  shall  refer  to  the  matrix  Av  '  even  if  some  of  the  columns  have 
been  interchanged. 

One  possibility  is  to  choose  at  the  stage  the  columns  of 

A^k'  which  will  maximize  |a^k+1'i  .  This  is  equivalent  to  searching  for 

T 

the  maximum  diagonal  element  in  the  Cholesky  decomposition  of  A  A  . 
het 


m 


E( 

j=* 


aW 


for  j  =  k,k+l, . . . ,n  . 


Then  since 

.00 


( k+ 1 )  |  ,  (k)  ,l/2  .  ..  ,  .,  , 

i  |  =  (s,  )  ,  one  should  choose  that  column 

KK  K  /,  ,  \ 

is  maximized.  After  A'  has  been  computed,  one  can 


sjk+l^  as  follows: 

A1"0  .  s<k)  - 

j  k,  j 


(.1  =  k+l, . . .  ,n) 


for  which 
compute 


since  tiie  orthogonal  transformations  leave  the  column  lengths  invariant. 
Naturally,  the  s9‘^  ’s  must  be  interchanged  if  the  columns  of  A^  are 
interchanged. 
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The  above  strategy  is  useful  in  determining  the  rank  of  a  matrix. 
If  the  rank  of  A  is  r  and  the  arithmetic  is  performed  exactly,  then 


after  r  transformations 


(n-r)xr 


in-r ;x 
iJ 


s(r+l)  .  0 
J 


j  =  r+i, . . .,n 


w);ich  implies  N  =  0  .  In  most  situations,  however,  where  rounded 
arithmetic  is  used  ||n||  =  c  .  It  is  not  easy  to  determine  bounds  on  £ 
when  the  rank  of  A  is  unknown. 

The  strategy  described  above  is  most  appropriate  when  one  has  a 
sequence  o'  vectors  b^,b0, . . . ,b  for  which  one  desires  a  least  squares 
estimate.  In  many  problems,  there  is  but  one  vector  b  and  one  wishes 
to  express  it  in  as  few  columns  of  A  as  possible.  Or  more  precisely, 
one  wishes  to  determine  the  k  indices  such  that 


v"1  ^  a  2 

I  v'bi  -  Z  v  *i  r  =  min- 

i=l  U=1  ' 0 


cannot  solve  this  problem,  but  we  shall  show  how  to  choose  index  k 
when  the  first,  k-1  indices  are  given  so  that  the  sum  of  squares  of 
residuals  is  maximally  reduced.  This  is  the  stage-wise  regression  problem. 
We  define 


b  and  c<k+l)  =  P(k>Jk>  .  Now  rOO^X'D  . 


where 


„  v  ;• _  j  ) 

x  '  is  tne  least  squares  estimate  based  on  (k-l)  columns  of  A  and 
'  "  =  C,'jk'*,c^k\...,c^k^)  .  Thus  by  (2.2) 


m 


z  (cSk+1)>? 


j=k+l 
m 


=  E< 

j=k 
m 


(K+l)}2  _  ,  (k+l)^2 


k 

(kK2  _  ,  (k+lk2 


r 


=  Z  -  ( 

j=k  J 

since  length  is  preserved  under  an  orthogonal  transformation.  Consequently, 
we  wish  to  choose  that  column  of  A^  which  will  maximize  |c^+'*"^|  . 

Let 


t(k)  =  (TaiVc^)  for  j  =  k+1, , . .  ,n 
J  i=k  1 


m 


.(k)  (k), 


Then  since  |c£k+1^|  =  I  (^?=k  ai^c^  )/ I  >  one  should  choose  that 

column  of  for  which  (t  )2/s  ^  is  maximized.  After  is 

J  J 

fk)  (k) 

applied  to  A v  ,  one  can  adjust  t.  '  as  follows: 

J 

t(k+l)  _  t(k)  _  a(k+l)c(k+l) 

i  ~  i  kj  k 

(k)  2  fk) 

In  many  statistical  applications,  if  (t.  )  /s.  is  sufficiently  small, 

0  0 

then  no  further  transformations  are  performed. 


4.  Statistical  calculations 

In  many  statistical  calculations,  it  is  necessary  to  compute  certain 

T 

auxiliary  information  associated  with  A  A  .  These  can  readily  be  obtained 
from  the  orthogonal  decomposition.  Thus 

m  2 

det (A  A)  =  (rn  x  r22  X  ...  X  rnn) 

Since 

ata  =  rtr  ,  (ata)_1  =  r"lr"t 

The  inverse  of  R  can  be  readily  obtained  since  R  is  an  upper  triangular 
matrix.  It  is  possible  to  calculate  (A  A)  directly  from  R  .  Let 
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(A  A)  _  X  -  (x^, Xg, . . . , )  . 

Then  from  the  relationship 


and  by  noting  that  {R  }  =  l/r^  ,  it  is  possible  to  compute  xn>xn  i>’*',xi  * 

The  number  of  operations  is  roughly  the  same  as  in  the  first  method  but 
mere  accurate  bounds  may  be  established  for  this  method  provided  all  inner 
products  are  accumulated  to  double  precision. 

In  some  applications,  the  original  set  of  observations  are  augmented 
by  an  additional  set  of  observations.  In  this  case,  it  is  not  necessary 
tc  begin  the  calculation  from  the  beginning  again  if  the  method  of 
orthogonalization  is  used.  Let  R^,c^  correspond  to  the  original  data 
after  it  has  been  reduced  by  orthogonal  transformations  and  let  Ag,bg 
correspond  to  the  additional  observations.  Then  the  up-dated  least  squares 
solution  can  be  obtained  directly  from 


—  _ 

A  = 

A2 

0  m  0 

,  b  = 

52 

•  •  • 

!i_ 

This  follows  immediately  from  the  fact  tint  the  product  of  two  orthogonal 
transformations  is  an  orthogonal  transformation. 

The  above  observation  has  another  implication.  One  of  the  arguments 
frequently  advanced  for  using  normal  equations  is  that  only  n(n+l)/2 
memory  locations  are  required.  By  partitioning  the  matrix  A  by  rows, 
however,  then  similarly  only  r»(n+l)/2  locations  are  needed  when  the 
method  of  orthogonalization  is  used. 

In  certain  statistical  applications,  it  is  desirable  to  remove  a  row 
of  the  matrix  A  after  the  least  squares  solution  has  been  obtained.  This 
can  be  done  in  a  very  simple  manner.  Consider  the  matrix 


-  _ 

R 

c 

. 

and  d  = 

_*•  ?  J 

i.  P 

wl.ere  a  is  the  row  of  A  which  one  wishes  to  remove,  p  is  the  corresponding 
element  of  b  ,  and  j  =  /-l  .  Note  that 


8 


=  S  ,  and  n+1s^^ 

(2) 

We  choose  cos  0  so  that  {Sv  '}  ,  =  0  .  Thus 

=  (r]_^rij_Q:iaj  )/v^(z'ii"Q:^)  j  =  2,5, ...,n 
ts^  'in+l,j  =  1^airij"ajrilV'/r<'rll'ai^  J  =  2,5,... ,n 


Note  no  complex  arithmetic  is  really  necessary.  The  process  is  continued 
as  follows: 

Let 


k  n+1 


9 


Then 


S(k+T) 


=  Z. 

k,n+l 


(k) 


1c  —  1,2, ... ,n  , 


and  cos  ©  is  determined  so  that  =  0  .  Thus  roughly  3n^ 

iC  A  ,  il+  X 

multiplications  and  divisions  and  n  square  roots  are  required  to  form  the 
new  R  . 


Suppose  it  is  desirable  to  add  an  additional  variable  so  that  the 
matrix  A  is  augmented  by  a  vector  g  (say).  The  first  n  columns  of 
R^n  are  unchanged.  Now  one  computes 


h  =  ...  g  . 

From  h  one  can  compute  P^n+1^  and  apply  it  to  p^n^. .  .p^^b  .  This 
technique  is  also  useful  when  an  auxiliary  serial  storage  (e.g.  magnetic 
tape)  is  used. 

It  is  also  possible  to  drop  one  of  the  variables  in  a  single  fashion 
after  R  has  been  computed.  For  example,  suppose  we  wish  to  drop 
variable  1  ,  then 


By  using  plane  rotations,  similar  to  those  given  by  (4.l),  it  is  possible 
to  reduce  R  to  the  triangular  form  again. 


5.  Gram-Schmidt  orthogonalization 
In  §2,  it  was  shown  that  it  is  possible  to  write 


QA  =  R 


(5.1) 


The  matrix  Q  is  constructed  as  a  product  of  Householder  transformations. 
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From  (5.l),  we  see  that 

A  -  QTR  =  PS 
T 

where  P  P  *  I  ,  S  :  ^  .  Each  row  of  S  and  each  column  of  P  is 

uniquely  determined  up  to  a  scalar  factor  of  modulus  one.  In  order  to  avoid 

computing  square  roots,  we  modify  the  algorithms  so  that  S  is  an  upper 

T 

triangular  matrix  with  ones  on  the  diagonal.  Thus  P  P  =  D  ,  a  diagonal 
matrix.  The  calculation  of  P  and  S  may  be  calculated  in  two  ways. 

a)  Classical  Gram-Schmidt  Algorithm  (CGSA) 

The  elements  of  S  are  computed  one  column  at  a  time.  Let 

a(  ^  =  f2l»52»,,,»5k-l,£k,*",Sn^ 

and  assume 

Bi  Bj  -  &ijdi  *  1  <  *->3  <  k-L  • 

At  step  k  ,  we  compute 

Sik  =  (b!  W  ’  1  <  i  < 

k-1  2 

5k  *  fk  ■  Asik  Si  »  • 

1=1 


b)  Modified  Gram-Schmidt  Algorithm  (MGSA) 

Here  the  elements  of  S  are  computed  one  row  at  a  time.  We  define 

AW  „  v  v  a00\ 

and  assume 

Si  Sj  -  6ijdi  ’  Pi  -  0  ’  1  <  i, J  <  K-l  ,  k  <  i  <  n  . 

(k ) 

At  step  k  ,  we  take  p^  =  a^  ,  and  compute 

\  “  kJl  >  sk i  -  (h  >  !ik+1)  -  !ik)'8k/  Bk  »  k+1  S  1  <  n 


li 


In  both  procedures,  s  =  1  .  The  ■cwo  procedures  in  the  absence  of 

KK 

roundoff  errors,  produce  the  same  decomposition.  However,  they  have 
completely  different  numeric? 1  properties  when  n  >  2  .  If  A  is  at  all 
"ill-conditioned",  then  using  the  CGSA,  the  computed  columns  of  P  will 
soon  lose  their  orthogonality.  Consequently,  one  would  never  use  the 
CGSA  without  reorthogonalioation,  which  greatly  increases  the  amount  of 
computation.  Recrthogonalization  is  never  needed  wnen  using  the  MGSA. 

A  careful  roundoff  analysis  is  giver,  by  BJORK  [1967].  RICE  [1966]  has 
shown  experimentally  that  the  MGSA  produces  excellent  results. 

The  MGSA  has  the  advantages  that  it  is  relatively  easy  to  program, 
and  experimentally  (of.  JORDAN  [1968]),  the  least  squares  solution  seems 
to  be  slightly  more  accurate  than  the  Householder  procedure.  However,  it 
requires  roughly  mn  / 2  operations  which  is  slightly  more  than  that  necessary 
in  'he  Householder  procedure.  Furthermore,  it  is  not  as  simple  as  the 
Householder  procedure  to  add  observations,  and  the  vectors  generated  by  the 
Householder  procedure  are  more  nearly  orthogonal  than  those  generated  by  MGSA. 


6.  Sensitivity  of  the  solution 

We  consider  first  the  inherent  sensitivity  of  the  solution  of  the 
least  squares  problem.  For  this  purpose  it  is  convenient  to  introduce  the 
condition  number  k(a)  of  a  non-square  matrix  A  .  This  is  defined  by 

<■  (A)  «  <7  /0  ,  a  =  max  |[Ax||  /  ||x||0  ,  a  .  min  ||Axj|0  /  ||x|| 

in  1  x^0  ~  2  ~  n  x^O  ~  2  ~  2 

2  2  T 

sc  that  and  an  are  the  greatest  and  the  least  eigenvalues  of  A  A  . 

From  its  definition  it  is  clear  that  k(a)  is  invariant  with  respect  to 

unitary  transformations.  If  R  is  defined  as  in  (2.1)  then 

a^R)  -  aL(A)  ,  oJr)  -  ojA)  ,  *(R)  -  *(A)  , 

while 

aL(R)  =  ||R||2  and  o^R)  =  1/  ||R-1||2  . 

The  commonest  method  of  solving  least  squares  problems  is  via  the  normal 
equations 

ATAx  =  ATb  .  (6.1) 
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T 

The  uatrix  A  A  is  square  and  we  havi 
k(ATA)  =  k2(A) 

•  /2 

This  means  that  if  A  has  a  condition  number  of  the  order  of  2"'  then 

T  t 

A  A  has  a  condition  number  of  order  2  and  it  will  not  be  possible 

using  t-digit  arithmetic  to  solve  (6.1).  The  method  of  orthogonal 

transformations  replaces  the  least  squares  problem  b,v  the  solution  of 

the  equations  Rx  =  e  and  k (R)  =  . (A)  .  I1  would  therefore  seem  to  have 

substantial  advantages  since  we  avoid  working  with  a  matrix  witn  condition 
2 

number  k  (A)  . 

We  now  show  that  this  last  remark  is  an  oversimplification.  To  this 
end,  we  compare  the  solution  of  the  original  system  [A  .'  b]  with  that  of 
a  perturbed  system.  It  is  convenient  to  assume  that 


this  is  not  in  any  sense  a  restriction  since  we  can  make  |{a||2  and  ||b||g 
of  order  unity  merely  by  scaling  by  an  appropriate  power  of  two.  We  now 
have 

k( A)  =  #c(5)  -  ||R-1||2  =  l/an  . 

Consider  the  perturbed  system 

(A  +  eE  j  b  +  Ee)  ,  l|E|j2  =  \\e\\2  =  1  , 

where  e  is  to  be  arbitrarily  small.  The  solution  x  of  the  perturbed 
system  satisfies  the  equation 

(A  +  eE)T(A  +  eE)x  =  (A  +  sE)T(b  +  te)  .  (6.2) 

If  x  is  the  exact  solution  of  the  original  system  and  Q  is  the  exact 
orthogonal  transformation  corresponding  to  A  we  have 


QA  = 

R 

•  •  • 

,  Q(A  +  eE)  ■ 

R  +  eF 

,  Qe  e 

4' 

Jl 

•  ■  • 

0 

eG 

g 

and 

r  =  t-Ax  ,  ATr  =  9 


13 


Equation  (6.2)  therefore  becomes 


(A  +  eE)T(A  +  eE)  =  (AT  +  eE1 )  (Ax  +  r  +  ee) 


giving 


f  n 

I  R  +  eF 

T 

r~ 

R  +  eF 

X  = 

R  +  eF 

T 

( 

r  ~1 

R 

x  +  e 

— 

f 

) 

L  *  J 

eG 

eG 

[ 

0 

_  «■ 

-5. 

J 

Neglecting  where  advantageous. 


(R  +  eF)^(R  +  eF)x  =  (R  +  eF)~  Rx  +  e (R  +  eF)T  f  +  eET  r  +  0(e2) 

x  =  (1  +  eF)'1  Rx  +  e(K  +  eF)_1  f  +  e(RTR)-1  tfr  +  0(e2) 
=  x  -  eR"1  Fx  +  eR-1  f  +  E(rl'5)"1  ETr  +  0(e2) 


ix-x||2  < 


e||B"1H2!i^ll2l|x||2 


2ilf!!2  +  ells' 


+  eilR' 

<  e«-(A)||x||0  +  ek(  A)  +  E/c‘"(A)||rll2  +  0(e‘i) 


In  2 
2 


2  IIeII2!MI2  +  o(e2) 


We  observe  that  the  bounds  include  a  term  ck  (A)||r||g  .  It  is  easy  to 
verify  by  means  of  a  3x2  matrix  A  that  this  bound  is  realistic  and 
that  an  error  of  this  order  of  magnitude  does  indeed  result  from  almost 
any  such  perturbation  E  of  A  .  We  conclude  that  although  the  use  of 
the  orthogonal  transformation  avoids  some  of  the  ill  effects  inherent  in 
tlie  use  of  the  normal  equations  the  value  x  ( A)  is  still  relevant  to  some 


extent . 

When  the  equations  are  compatible  ||r||0  =  0  and  the  term  in  x2(A) 

disappears.  In  the  non-singular  linear  equation  case  r  is  always  null 

2  ~ 

and  hence  it  is  always  x( A)  rather  than  k  (a)  which  is  relevant. 

Since  the  sensitivity  of  the  solution  depends  on  the  condition  number, 
it  is  frequently  desirable  to  replace  the  original  unknowns  x  by  a  new 
vector  of  unknowns  D  ''‘x  where  T)  is  a  diagonal  matrix  with  non-zero 
diagonal  elements.  Thus  we  wish  to  find  y  for  which 


where  C  =  AD  and  y  =  D'^x  .  Let  be  the  set  of  all  n  x  n  diagonal 

matrices  with  non-zero  diagonal  elements.  We  wish  to  choose  D  so  that 

k(AD)  <  k(AD)  for  all  Dofi^ 

Let  De£n  and  {D}li  =  l/||ail|0  .  VAN  DER  SLU1S  [  1968]  has  shown  that 
0 

k(AD)  < /n  <(AD) 

Therefore  in  the  absence  of  other  information,  it  would  appear  that  it  is 
best  to  precondition  the  matrix  A  so  that  all  columns  of  the  matrix  A 
have  equal  length.  In  practice,  one  adjusts  the  exponents  of  the  stored 
elements  of  A  so  that  the  mantissa  of  the  floating  point  representation 
is  not  changed. 


7 .  Iterative  refinement  for  least  squares  problers 

The  iterative  refinement  method  may  be  used  for  improving  the 
solution  to  linear  least  squares  problems.  Let 


Qp  =  b-Ax 


a  >  o 


so  that 


T  T  T  ~ 

OA  p  =  Ab-AAx  =  0 


When  a  =  1  ,  the  vector  p  is  simply  the  residual  vector  r  .  Thus 


ai 


b 

0 


(7.1) 


or 


Cy  =  g 


One  of  the  standard  methods  for  solving  linear  equations  may  now  be  used 
to  solve  (7.1).  However,  this  is  quite  wasteful  of  memory  space  since  the 
dimension  of  the  system  to  be  solved  is  (nn-n)  .  We  may  simplify  this 
problem  somewhat  by  noting  with  the  aid  of  (2.3)  that 
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LU  . 


(7.2) 


I 

ai  | 

A 

1 _ 

/a  1 

0 

„T 

A 

1  ° 

75  A? 

J_  rT 

/a 

/a  I 

f  A 
/a 

0 

1 

1 

Once  an  approximate  solution  to  Cy  =  g  has  been  obtained,  it  is 
frequently  possible  to  improve  the  accuracy  of  the  approximate  solution. 

Let  y  be  an  approximate  solution,  and  let  v  »  g-Cy  .  Then  if  y  =  y+b  , 
satisfies  tne  equation 


Cb  =  v 


(7-3) 


Kqur.  /  on  (7.3)  can  be  solved  approximately  from  the  decomposition  (7.2).  Of 
course,  it  is  not  possible  to  solve  precisely  for  b  so  that  the  process 
may  be  repeated. 

V.'e  are  now  in  a  position  to  use  the  it-erative  refinement  method 
(of.  MOLER  [1967],  WILKINSON  [1967])  for  solving  linear  equations.  Thus  one 
might  proceed  as  follows: 

l)  Solve  for  x^°^  using  one  of  the  orthgonalization  procedures  outlined 
in  f‘  2  or  5*  R  must  be  saved  but  it  is  not  necessary  to  retain  Q  .  Then 

o(o)  -±  (b-A:>>)  . 

a  ~  ~  ' 


'.}  The  vector  y^S+^  is  determined  from  the  relationship 

( s+ 1 )  ( s )  ( s ) 

y  »  y  +b 


where 


CS(S)  =  g-Cy(s)  S  V(S) 


(7-M 


This  calculation  is  simplified  by  solving 


I;.(*0  =  v(s) 
u:.  (s)  =  „(s) 

(s) 

The  vector  vs  ’  must  be  calculated  using  double  precision  accuracy  and 
then  rounding  to  single  precision. 
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A  s )  ( s ) 

3)  Terminate  the  iteration  vhen  1'|r.As  ji  /  jjy^“/]l  is  less  Minn  .. 
prescribed  number. 

Note  that  the  computed  residi.ial  vector  is  an  approximat  ion  to  the 
residual  vector  when  the  exact  solution  x  is  known.  This  may  diiier 
from  the  residual  vector  computed  from  the  approximate  solution  to  the 
least  squares  problem. 

There  are  three  sources  of  error  in  the  process:  (l)  computation 

Cs) 

of  the  vector  v'  ’  ,  (2)  solution  of  the  system  of  equations  for  the 

(s) 

correction  vector  ‘  ,  and  (3)  addition  of  the  correction  vector  to 

~  (s) 

the  approximation  y  .  It  is  absolutely  necessary  to  compute  the 

(s\ 

components  of  the  vector  v  '  using  double  precision  inner  products  and 
then  to  round  to  single  precision  accuracy.  The  convergence  of  tne  iterative 
refine,  it  process  has  been  discussed  in  detail  by  MOLER  [196?].  Generally 
speaking,  for  a  large  class  of  matrices  for  k  >  kQ  all  components  of  y'0,/ 
are  the  correctly  rounded  single  precision  approximations  to  she  components 
of  y  .  There  are  exceptions  to  this,  however,  (cf.  KAHAN  [i960]). 
Experimentally,  it  has  been  observed,  in  most  instances,  that  if 

/  lly(o  IL  ^ 2_p  where 
ML  =  Kl 

l<i<n 

then  kQ  >  [t/p]  .  We  shall  return  to  the  subject  of  iterative  refinement 
when  we  discuss  the  solution  of  linear  seast  squares  problem  with  linear 
constraints. 

A  variant  of  the  above  procedure  has  been  analyzed  by  BJ&RCK  [1967b], 
[1968],  and  he  has  also  given  an  ALGOL  procedure.  This  has  proved  to  be 
a  very  effective  method  for  obtaining  highly  accurate  solutions  to  linear 
least  squares  problems. 


8.  Least  squares  problems  with  constraints 

Frequently,  one  wishes  to  determine  x  so  that  ||b-Ax(|g  is  minimized 
subject  to  the  condition  that  Gx  =  h  where  G  is  a  pxn  matrix  of  rank  p  . 
One  can,  of  course,  eliminate  p  of  the  columns  of  A  by  Gaussian  elimination 
after  a  pxp  non- singular  submatrix  of  G  has  been  determined  and  then  solve 
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the  resulting  normal  equations.  This,  unfortunately,  would  not  be  a  numerically 
stable  scheme  since  no  row  interchanges  between  A  and  G  would  be  permitted. 

If  one  uses  Lagrange  multipliers,  then  one  must  solve  the  (n+p)x(n+p) 
system  of  equations. 


— 

T 

A  A 

Ho 

— 

X 

at. 

A  D 

G 

0 

b 

SS 

h 

- 

- 

L  _ 

— 

(8.1) 


T  -IT  t  -IT 

where  \  is  the  vector  of  Lagrange  multipliers.  Since  x  =  (A  A)  A  b-(A  A)"V^  , 
G(ATA)-1  GT  K  =  Gz-h 


where 


z  =  (ATA)-1  AT  b 


Note  z  is  the  least  squares  solution  of  the  original  problem  without 
constraints  and  one  would  frequently  wish  to  compare  this  vector  with  the 
final  solution  x  .  The  vector  z  ,  of  course,  should  be  computed  by  the 
orthogonalization  procedures  discussed  earlier. 

Since  AJA  =  RTR  ,  G(ATA)'1GT  -  wL/  where  W  =  R_TGT  .  After  W  is 
computed,  it  should  be  reduced  to  a  pxp  upper  triangular  matrix  K  by 
orthogonalization.  The  matrix  equation 


K  K\  =  Gz-h 

should  be  solved  by  the  obvious  method.  Finally,  one  computes 
x  =  z-(ATA)_1  CTX 

T  - 1  T  ~  i 

whore  (A  A)  G  'X  can  be  easily  computed  by  using  R~ 

It  is  also  possible  to  use  the  techniques  described  in  §7.  Again, 
let  r  =  b-Ax  so  that  from  (B.l) 


I 

-  — 1 

A 

j—  • 

0 

at 

0 

gt 

0 

G 

0 

b 

e 

h 


(3.2) 


18 


or 


Dz  =  g  . 

Note  D  is  an  (m+n+p)x  (rtH-n+p)  matrix.  We  may  simplify  the  solution 
of  (8.2),  however,  by  noting  that 


I 

A 

!  0 

= 

1 

1  ° 

0 

I 

A 

0 

T 

A 

0 

EH°  i 

_ 1 

,  T  1 
A  l 

r  , 

!  ^ 

0 

_ 

0 

R 

-B 

0 

G 

0 

- 

0 

|  bt| 

1 

l _ 

0 

L 

0 

S 

where  B  =  (GR_1)T  =  PS  and  PTP  =  I  with  S  :  ^  .  The  decomposition 
(8.3)  can  be  used  very  effectively  in  conjunction  with  the  method  of  iterat 
refinement.  BjftRCK  and  G0LU3  [1967]  have  given  a  variant  of  the  above 
procedure  which  requires  Q  and  P  . 


9.  Linear  least  squares  solutions  with  inequality  constraints 

Again  let  A,G  be  given  real  matrices  of  orders  mxn  ,  pxn  ,  with 
m  >  n  ,  and  let  b  ,  h  be  given  real  vectors  of  orders  m  ,  p  .  For  any 
vector  x  we  define 

r  =  b-Ax 

and  we  wish  to  determine  an  x  such  that 
T 

r  r  a  mm. 

subject  to 

Gx  >  h 

Our  problem  can  therefore  be  stated  as  follows:  find  r  ,  x  ,  w  such  that 
r  +  Ax  =  b 
Gx  -  w  =  h 

w  >  e 
T 

r  r  *  min. 
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These  problems  can  be  solved  by  quadratic  programming  but  we  present 
an  algorithm  in  this  section  which  may  lead  1o  a  much  smaller  system  o)' 
equations  and  which  yields  highly  accurate  results. 

If  we  define 


ITT  T 

f(r,w,x,y,z)  =  g  r  r  -  y  (r4Ax-b)  -  z  (Gx-w-h) 

where  we  require  without  loss  of  generality  that  z  >  0  ,  then  an  equivalent 
problem  is  to  determine  r,w, x,y,z  such  that 


and 


w,  z  >  0 

f  is  stationary. 


Equating  tc  zero  the  partial  derivatives  of  f  with  respect  to  r,x,y,  z 
respectively,  we  get 

r  -  y  =0 
T  T 

-Ay  -  Qlz  =  0 
r  +  Ax  -  b  =  0 


Gx  -  w  -  h  =  0 


Further,  let  the  elements  of  w,z  be  w.,2^  (i  =  1,2, ...,p)  .  Then 

df  _  z 
dw.  i 

l 

’cw  if  w  >  0  in  the  optimal  solution,  the  constraint  w.  >  0  is  not 

1  l  — 

-indi.n  •  and  we  have 


i.  e. , 

w.  >  0  =>  z^  =  0  . 

Since  z^  >  0  ,  this  further  means  that 
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0  . 


z.  >  0  =>  w. 

1  i 


(For  otherwise,  z.  >  0  =>  w.  >  0  =>  z.  =0  wh '.ch  is  a  contradiction.) 

ill 


Accordingly,  our  problem  has 

become  one  of  finding  a 

solution  of  the 

system 

r  +  Ax 

=  b 

(9-1) 

T  T 

A  r  +  G  z 

=  0 

(9-2) 

Gx  -  w 

=  h 

(9-3) 

such  that 

T 

z  >  6  ,  w  >  Q  ,  zw  =  0 

We  now  determine  an  orthogonal  matrix  Q  and  an  upper- triangular 
matrix  R  such  that 


A  =  OF  , 


where  R  is  uxn  and  non-singular  if  rank(A)  =  n  .  Then 


T  T  T  T 

A  A  =  R  Q  QR  =  H  R 


“1  T 

Letting  B  =  (GR  )  and  eliminating  r  from  (9.l)  and  (9.2)  it  is  easily- 
verified  that 


x  =  x  +  R  ■'‘Bz  ,  (9-M 

where 

x  =  (R^)-1  ATb 

is  the  unconstrained  least  squares  solution  (i.e.,  the  solution  of  (9. l)  and 
(9.2)  with  z  =  Q  )•  x  is  found  by  the  methods  of  §7 ■ 

We  now  determine  if  x  satisfies  the  original  inequalities:  if  we 
define  q  =  G5t-h  and  find  that  q  >  Q  then  the  constraints  are  satisfied 
and  x  solves  the  problem. 

Otherwise,  we  substitute  (9-1*)  in  (9-3)  and  obtain 
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G(x  +  R  ^Bz)  -  w  =  h 


B  Bz  +  q  =  w 

where  we  further  require  >  (9-5) 

z  >  0  ,  w  >  @  ,  z^w  =  0 

~  ~  “  ~  j 

Thus  we  find  that  z,w  solve  the  linear  complementarity  problem  (l£P) 
defined  by  (9.5)-  This  is  a  fundamental  mathematical  programming  problem, 
and  several  algorithms  have  been  developed  for  finding  solutions  (e.g.  see 
LEMKE  [1968],  COTTLE  [1968],  COTTLE  and  LANTZIG  [1968]).  The  matrix  M  =  BTB 
is  positive  semi-definite,  and  this  is  one  of  the  cases  when,  for  example, 
the  principal  pivoting  method  in  COTTLE  [1968]  guarantees  termination  with 
a  solution,  or  with  an  indication  that  none  exists. 

Once  z  has  been  found  it  would  be  a  simple  matter  to  substitute 
into  (9-l)>  (9.2)  and  find  r,x  from 


r  +  Ax  =  b 


T 

=  -G  z 


In  practice,  however,  if  we  are  concerned  with  the  accuracy  of  our  estimate 
of  x  we  use  the  solution  of  the  LCP  (9*5)  only  to  determine  which  elements 
of  w  are  exactly  zero.  These  are  the  w^  which  are  non-basic  in  the 
solution  of  (9. 5).  (There  is  certainly  at  least  one  such  w^  ,  for 
otherwise  we  would  have  z  =  9  ,  w  >  0  ,  which  is  the  case  checked  for 
earlier  in  determining  whether  or  not  x  solved  the  problem.) 

We  now  delete  from  (9«3)  those  constraints  for  which  w^  is  basic, 
obtaining  an  ixn  system  of  equations 

Gx  =  h 

where  1  <  l  <  p  . 

If  z  is  the  vector  z  with  the  corresponding  elements  deleted,  the 
remaining  step  is  to  solve  the  system 


1 


r  +  Ax 


T  -T 

A  r  +  G  z  =  d 


where  we  are  now  working  with  original  daua  and  can  therefore  expect  a 
more  accurate  solution  than  could  be  obtained  from  (9*6).  We  can  now  apply 
the  methods  of  §8  to  this  system  of  equations. 

The  standard  methods  for  solving  the  linear  complementarity  problem 
enploy  the  elements  of  w  as  the  initial  set  of  basic  variables,  with  all 
elements  of  z  initially  non-basic.  In  general,  it  is  probable  that  only 
a  small  proportion  of  the  inequalities  in  the  original  problem  will  be 
constraining  the  system,  which  means  that  only  a  small  proportion  of  the  w. 
will  be  non-zero.  Hence  it  might  be  expected  in  general  that  only  a  small 
number  of  iterations  (relative  to  p  )  should  be  required  to  bring  some  of 

the  z.  into  the  basis  and  reach  a  feasible  solution. 

1  T 

In  our  particular  form  of  the  problem,  since  the  matrix  M  =  b  B 

has  its  largest  elements  on  the  diagonal,  accuracy  can  be  conserved,  to 

within  the  limits  of  the  error  in  forming  M  ,  by  interchanging  rows 

whenever  a  column  of  M  is  brought  into  the  basis  in  such  a  way  that  the 

diagonal  elements  of  M  become  diagonal  elements  of  the  basis  matrix. 

This  is  easily  done  if  the  LU  decomposition  of  the  basis  is  calculated 

each  iteration  as  in  the  treatment  of  the  simplex  method  by  BARTEI£  [1968] 

and  BARTELS  and  GOLUB  [1969]. 

-IT 

Note  that  B  =  (GR  )  can  be  determined  column  by  column  via 
repeated  back- substitution  on  the  system 

T  T 
R  B  =  G 

The  algorithm  presented  here  can  be  used  for  any  quadratic  programming 
problem  when  a  positive  definite  quadratic  form  is  given.  Suppose  we  wish 
to  determine  an  x  such  that 


T  T 

x  Cx  +  d  x  =  min. 


subject  to  Gx  >  h 
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Since  C  is  positive  del’inite,  we  may  write 


T 

=  R  R 


where  R ( ^\))  is  the  Cholesky  factor  of  C  .  Such  a  decomposition  can 

-T 

easily  be  computed.  If  we  now  define  b  =  -  ^  R  d  (and  calculate  b 
from  H  t  =  •  {  d  )  we  find  that 

||b  -  Rx|'^  =  tTb  -  ab^Rx  +  xTRTRx 


T 

=  b '  b  + 


T  T„ 
d  x  +  x  Cx 


Find  consequently  if  we  de  ermine  an  x  such  that 
ijb  -  R:<|'| =  min. 
subject  to  Gx  >  h 

: hen  x  will  satisfy  (9.8)  as  required. 


10.  Singular  systems 


If  the  rank  of  A  is  less  than  n  and  if  column  interchanges  are 
performed  to  maximize  the  diagonal  elements  of  R  ,  then 


A 


(r+1) 


R 

rxr 

S,  > 

(n-r,xr 

0 

0 

when  rank (A)  =  r  .  A  sequence  of  Householder  transformations  may  now  be 

(r+1) 


•‘.r.T'lied  on  the  r ight  of  A 

annihilated.  Thus  dropping  subscripts  and  superscripts,  we  have 


so  that  the  elements  of  S,  ■,  become 

(n-r)xr 


QAZ  =  T  = 


v!k  r*  7  is  an  rxr  upper  triangular  matrix.  Now 


lib  -  Ax|!2  =  ||b  -  QT  T  ZTx||2 

=  lie  -Ty||2 
T 

where  c  =  Qb  and  y  =  Z  x  .  Since  T  is  ol'  rank  r  ,  there  is  no  unique 
solution  so  that  we  impose  the  condition  that  i|x||0  =  min.  But  j|y||0  =  jlxjjrj 
since  Z  is  orthogonal,  and  IlylU  =  Mb’1*  when 


This  solution  has  been  given  by  FADEEV,  et.  al.  [196B]  and  HANSON  and 
lAWSON  [1968].  There  still  remains  the  problem  of  determining  the  rank 
numerically,  and  this  will  be  discussed  in  §12. 


lar  /alue  decomposition 


Let  A  be  a  real,  mxn  matrix  (for  notational  convenience  we  assume 
that  m  >  n  ).  It  is  well  known  (cf.  IANCZOS  [1951])  that 

A  =  UZVT  (ll.l) 

where 


T 

The  matrix  U  consists  of  the  orthonormalized  eigenvectors  of  AA  ,  and 

T 

the  matrix  V  consists  of  the  orthonormalized  eigenvectors  of  A  A  .  The 
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diagonal  elements  of  £  are  the  non- negative  squar-  roots  of  the  eigenvalues 
T 

i..f  A  A  ;  they  are  called  singular  values  or  principal  values  of  A  .  We 
a  s sume 


ct  >  a  >  ...  >  a  >  0  . 

1  —  d  —  —  n  — 

Thus  if  rank  (A )  =  r  ,  c  =  ...  =  Jn  =  C  .  The  decomposition 

(11. l)  is  called  the  singular  value  decomposition  (SVD). 

Let 


A  = 


T 

A  0 


(11.2) 


can  fee  shown  that  the  non-zero  eigenvalues  of  A  always  occur  in  + 
Lrs,  viz. 


\  (A)  =  +  a  (A)  (j  =  l,2,...,r) 

J  d 


(11.5) 


12.  Applications  of  the  SVD 


7he  singular  value  decomposition  plays  an  important  role  in  a  number 
of  least  squares  problems,  and  we  will  illustrate  this  with  some  examples. 
Throne-Lout  this  discussion,  we  use  the  Euclidean  or  Frobenius  norm  of  a 


;  r-{  u  pc  the  set  of  all  nxn  orthogonal  matrices.  For  an  arbitrary 

nxn  real  matrix  A  ,  determine  QeU  such  that 

n 

jA-Qj!  <  |]A-Xj|  for  any  XeUn  • 

T-  hris  been  shown  by  FAN  and  HOFFMAN  [1955)  that  if 
A  =  ICv"  ,  then  =  UVT 
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3)  An  important  generalization  of  problem  A  occurs  in  factor  analysis. 
For  arbitrary  nxn  real  matrices  A  and  B  ,  determine  Qelln  such  that 

||A-BQ||  <  ||A-BX||  for  any  XeUn 


It  has  been  shown  by  GREEN  [1952]  and  by  SCH&NEMANN  [1966]  that  if 
BTA  =  USVT  ,  then  Q  =  UV1 


fk) 

C)  Let  771 v  be  the  set  of  all  mxn  matrices  of  rank  k  .  Assume 

■~rn 

Ae^r^  .  Determine  Bi^^  (k  <  r)  such  that 

(k) 


IMII  <  11A-X|1  for  all  X^' 


It  has  been  shown  by  ECKART  and  YOUNG  [1956]  that  if 


where 


(12.1) 


(12.2) 


Note  that 


||A— b||  =  ||E-fik|| 


+  °P 


.2,1/2 


(12.5) 


D)  An  nxm  matrix  X  is  said  to  be  the  pseudo-inverse  of  an  mxn 
matrix  A  if  X  satisfies  the  following  four  properties: 
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i) 

AXA 

=  A 

ii) 

XAX 

=  X 

iii) 

(ax)t 

=  AX 

iv) 

(xa)t 

=  XA 

We  denote  the  pseudo- inverse  by  A+  .  We  wish  to  determine  A+  numerically. 
It  can  be  shown  (cf.  PENROSE  [1955])  that  A+  can  always  be  determined  and 
is  unique.  It  is  easy  to  verify  that 

A+  =  VAUT  (12.1+) 

where 


In  recent  years  there  have  been  a  number  of  algorithms  proposed  for 
computing  the  pseudo- inverse  of  a  matrix.  These  algorithms  usually  depend 
upon  a  knowledge  of  the  rank  of  the  matrix  or  upon  some  suitably  chosen 
parameter.  For  example  in  the  latter  case,  if  one  uses  (12.4)  to  compute 
the  pseudo- inverse,  then  after  one  has  computed  the  singular  value 
do  ’cmpositicn  numerically  it  is  necessary  to  determine  which  of  the  singular 
values  arc  sere  by  testing  against  some  tolerance. 

Alternatively,  suppose  we  know  that  the  given  matrix  A  can  be 
represented  as 


A  =  B+oB 


where  -,B  is  a  matrix  of  perturbations  and 
||5B||  <  q  . 
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r 


r 

i 


r 


Now,  we  wish  to  construct  a  matrix  B  such  that 

IMI  <  n 


and 


rank  (B)  =  minimum  . 

This  can  be  ac  jmplished  with  the  aid  of  the  solution  to  problem  (c).  Let 


Bk  =  No¬ 


where  SL  is  defined  as  in  (12.2).  Then  using  (12.3), 


B  =  B 


if 


and 


(Vi  +  V+  •••  +  an)1/2  <  n 


(a2  +  a2  +  ...  +  a2)1/2  >  q  , 

x  p  pH-1  n'  1 

Since  rank(B)  =  p  by  construction, 

B+  =  Vfi+UT 
P 

Thus,  we  take  B+  as  our  approximation  to  A+  . 


E)  Let  A  be  a  given  matrix,  and  b  be  a  known  vector.  Determine  x 
so  that  amongst  all  x  for  which  |(b-Ax||^  —  min  ,  ||x||  «  min.  It  is  easy 

to  verify  that 

x  =  A  b 


« 


c 

o 

c 

D 


13 •  Calculation  of  the  SVD 

It  was  shown  by  GOLUB  and  KAHAN  [1965]  that  it  is  possible  to  construct 
a  sequence  of  orthogonal  matrices 
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) 


1 


T 

J  k*=l 


f,  .  '■  n-1 
!  a'k'  ■ 


via  Householder  transformation  so  that 

p(n)p(n-l)^_p(l)AQ(l)0(2)  ^^(n-l)  g  ptAq 


and  J  is  an  mxn  bi— diagonal  matrix  of  the  form 


q  8  0 

1  P1 


a2 


|(m-n)xn 


The  singular  values  of  J  are  the  same  as  those  of  A  . 
singular  value  decomposition  of 

J  =  XZY1 


Thus  if  the 


r.o  that 


T  T 

A  =  PXTY  Q" 


U  =  PX  ,  V  =  QJ 


GOLUB  [1968]  has  given  an  algorithm  for  computing  the  SVD  of  J  ;  the 
algorithm  is  based  on  the  highly  effective  QR  algorithm  of  FRANCIS  [1961,  1962] 
for  computing  the  eigenvalues. 

It  is  not.  necessary  to  compute  the  complete  SVD  when  a  vector  b  is 
given.  Since  x  =  V£  Vb  ,  it  is  only  necessary  to  compute  V,2  and  U^b  j 
noi.f,  this  has  n  strong  flavor  of  principal  component  analysis.  An  ALGOL 
procedure  for  the  SVD  has  beeen  given  by  GOLUB  and  REINSCH  [1969]. 
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14.  Quadratic  constraints 


We  wish  to  determine  x  so  that 


|jh-Ax|L  =  min. 


Pll2  =  “  • 

Such  problems  occur  in  a  number  of  situations,  e.g.  in  the  numerical  solution 
cf  integral  equations  of  the  first  kind  (of.  PHILLIPS  [1962]),  and  in  the 
solution  of  non-linear  least  squares  problems  (cf.  MARQUARDT  [1963]). 

Using  Lagrange  multipliers,  we  are  led  t.o  the  equation 

(ATA-\*I)x  =  ATb 


where  the  real  constant  X*  is  determined  as  the  smallest  root  of 
p  m  T  -P  T 

a  -b  A(A  A-Xl)  A  b  =  0  . 


(14.1) 


Using  the  decomposition  A  =  LEV1,  and  c  =  bFb  ,  equation  (l4.l)  becomes 

P  T  P  -P 
a  -c  z(z  -XI)  s c  =  0  . 

A  combination  of  bisection  and  Newton  iteration  may  be  used  to  determine  X* 
I*  is  easily  shown  that  X*  <  ;T  (cf.  FORSYTHE  and  GOLUB  [1965]). 

It  is  also  possible  to  determine  X*  as  a  solution  to  an  eigenvalue 
problem  using  a  technique  given  by  FORSYTHE  and  GOLUB  [1965].  Consider  the 
iden* ity 


X  Y 


=  dot (X)  det  (W-ZX_  Y) 


wnieh  is  valid  for  any  partitioned  matrix  with  X  and  W  square  and 
del  (X)  0  .  Thus  (l4.l)  is  equivalent  to  the  determinantal  equation 

r(ATA-Xl)2  ATb  1 
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IIow  there  exists  a  vector  p  ar.d  a  number  q 


such  that 

(ATA-Xl)2p  +  ATbq  =  y  ,  bTAp  +  a2q  =  0  . 

A  simple  elimination  shows  that  X*  must  satisfy  the  determinantal  equation 

det[(AiA-Xl)‘i  -  Ci~"  ArblTA]  =  0  .  (l4.L) 

It  is  possible  to  transform  (it. 2)  into  a  2nx2n  ordinary  eigenvalue 
prt  tlc-m. 

Once  X*  is  determined,  the  solution  x  can  be  computed  from  the 
2 YD  of  A  .  Thus, 
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